
* This file is a re-replication of Gibler, Miller, and Little's replication of Braithwaite and Lemke (2011). 
* This is the first of two Braithwaite and Lemke (BL) replication files.
* It focuses on replicating how Braithwaite and Lemke's dependent variables were created.
* We could not replicate Braithwaite and Lemke's coding exactly, but we found a method that produces more similar results than GML's.

clear
capture log close
cd "C:\Users\rum842\Dropbox\3.4 MID McManus\ISQ Response\Final Submission\Replication Files" //Insert your own directory here.
log using "Braithwaite and Lemke Coding Replication.log", replace
set more off


****************************************************************************************************************************************

* COMPARE THE DISTRIBUTION OF BL AND GML'S DEPEDENT VARIABLES IN STAGE 2 OF THE REGRESSIONS

* This produces the numbers that are reported in the text.

use "B&L_CMPS.dta", clear

* BL sample
tab mutualforce if onset==1
tab fatal_250 if onset==1
tab war if onset==1

* GML sample
tab gml_mutualforce if gml_onset==1
tab gml_fatal_250 if gml_onset==1
tab gml_war if gml_onset==1


***********************************************************************************************************************************************
 

* REPLICATE HOW BL CREATE THEIR DYADS AND DEPENDENT VARIBLES, USING MID 3.0.

* Begin by figuring out which MIDs were bilateral on the first day.

use "MIDB 3.0.dta", clear
drop if orig==0
gen num_orig=1
collapse (sum) num_orig, by(dispnum)
save "temp2.dta", replace //save this information for later

* Create dyads from MID 3.0

use "MIDB 3.0.dta", clear
mvdecode dispnum-orig, mv(-9)
drop if sidea==0
ren * *a
ren dispnuma dispnum
save "temp.dta", replace

use "MIDB 3.0.dta", clear
mvdecode dispnum-orig, mv(-9)
drop if sidea==1
ren * *b
ren dispnumb dispnum

joinby dispnum using "temp.dta"

* Drop entire MID if not bilateral on day 1

merge m:1 dispnum using "temp2.dta" //file created above
drop if num_orig>2

* Drop joiner dyads - We are retaining MIDs that have joiners (unlike GML), but not retaining the joiner dyads themselves.

drop if origa==0 | origb==0

* Identify start year

egen year=rowmax(styeara styearb)

* Eliminate duplicates by collapsing

collapse (max) hostleva hostlevb fatalitya fatalityb (min) year, by(dispnum ccodea ccodeb)

* Merge in necessary variables from MIDA

merge 1:1 dispnum using "MIDA 3.0.dta", keepusing(fatality recip hostlev)

* Code dependent variables

* We experimented with different ways of creating these variables.
* Our goal was to find the method that gave us the closest match to Braithwaite and Lemke in each case.

gen onset_mid3=1

ren recip recip_mid3 //from MIDA
replace recip_mid3=0 if recip_mid3==.

gen force_mid3=(hostlev>3) //from MIDA
replace force_mid3=0 if hostlev==.

gen mut_force_mid3=(hostleva>3 & hostlevb>3) //from MIDB
replace mut_force_mid3=0 if hostleva==. | hostlevb==.

gen fatal_mid3=(fatalitya>0 | fatalityb>0)
// Following BL, we count missing fatalities as greater than 0.

gen high_fatal_mid3=(fatalitya>3 | fatalityb>3)
// Following BL, we count missing fatalities as greater than 250. This is not the best way to construct this variable, but it produces the best match with BL.

gen war_mid3=(fatality>5)
// Following BL, we count missing fatalities as greater than 1,000. This is not the best way to construct this variable, but it produces the best match with BL.

* There should only be one dyad per year in the BL data (although there appear to be a few extras), so we collapse again

gen ccodehigh=ccodea if ccodea>ccodeb
replace ccodehigh=ccodeb if ccodeb>ccodea
gen ccodelow=ccodea if ccodea<ccodeb
replace ccodelow=ccodeb if ccodeb<ccodea
gen dyadid=ccodelow*1000+ccodehigh

collapse (max) onset_mid3 recip_mid3 force_mid3 mut_force_mid3 fatal_mid3 high_fatal_mid3 war_mid3, by (dyadid year)

save "BL Dyadic MIDs 3.0.dta", replace 

* Merge into BL original dataset and see how the variables compare.
* Note: This is the BL replication dataset from Gibler's dataverse, but we added the variables ccodea and ccodeb and dropped some estimated variables.

use "B&L_CMPS.dta", clear

* B&L's dyadid variable is sometimes blank, so we fill it in.

gen ccodehigh=ccodea if ccodea>ccodeb
replace ccodehigh=ccodeb if ccodeb>ccodea
gen ccodelow=ccodea if ccodea<ccodeb
replace ccodelow=ccodeb if ccodeb<ccodea
replace dyadid=ccodelow*1000+ccodehigh

* Now merge

merge m:1 dyadid year using "BL Dyadic MIDs 3.0.dta"
drop if _merge==2
drop _merge
replace onset_mid3=0 if onset_mid3==.

* Check how close the matches are

tab onset onset_mid3
tab recipro recip_mid3
tab force force_mid3
tab fatal_0 fatal_mid3
tab fatal_250 high_fatal_mid3
tab war war_mid3

* They are never an exact match, but pretty close.


************************************************************************************************************************************************


* REPLICATE HOW GML CREATED THEIR ONSET VARIABLE

* Begin by figuring out which MIDs are ever multilateral.

use "GML MIDB ISQ.dta", clear
gen num_partic=1
collapse (sum) num_partic, by(dispnum3)
save "temp2.dta", replace

* Create dyads from MID 3.0

use "GML MIDB ISQ", clear
drop if sidea==0
ren * *a
ren dispnum3a dispnum3
save "temp.dta", replace

use "GML MIDB ISQ", clear
drop if sidea==1
ren * *b
ren dispnum3b dispnum3

joinby dispnum3 using "temp.dta"

* Drop entire MID if not bilateral for its entire history - this is the major difference between BL's method and GML's method.

merge m:1 dispnum3 using "temp2.dta"
drop if num_partic>2

* Drop joiner dyads - GML also seem to do this slightly differently from BL, but it only changes the coding of one observation.

drop if origa==0 & origb==0

* Identify start year

egen year=rowmax(styeara styearb)

* Eliminate duplicates by collapsing

collapse (max) hostleva hostlevb fatalitya fatalityb (min) year, by(dispnum3 ccodea ccodeb)

* Code dependent variable of interest

gen rep_gml_onset=1

* There should only be one dyad per year in the BL data, so we collapse again

gen ccodehigh=ccodea if ccodea>ccodeb
replace ccodehigh=ccodeb if ccodeb>ccodea
gen ccodelow=ccodea if ccodea<ccodeb
replace ccodelow=ccodeb if ccodeb<ccodea
gen dyadid=ccodelow*1000+ccodehigh

collapse (max) rep_gml_onset, by (dyadid year)

save "BL Dyadic MIDs 3.0 with GML Onset Coding.dta", replace 

* Merge into BL dataset and see how the variables compare.

use "B&L_CMPS.dta", clear

* B&L's dyad id variable is sometimes blank, so we fill it in.

gen ccodehigh=ccodea if ccodea>ccodeb
replace ccodehigh=ccodeb if ccodeb>ccodea
gen ccodelow=ccodea if ccodea<ccodeb
replace ccodelow=ccodeb if ccodeb<ccodea
replace dyadid=ccodelow*1000+ccodehigh

* Now merge

merge m:1 dyadid year using "BL Dyadic MIDs 3.0 with GML Onset Coding.dta"
drop if _merge==2
drop _merge
replace rep_gml_onset=0 if rep_gml_onset==.

* Check how close the matches are

tab gml_onset rep_gml_onset
* There are only two observations different, so we seem to have almost exactly replicated GML's method of coding onset.

tab gml_onset onset
* There are 368 differences GMLS's and BL's onset variable. Below we show that many of these differences are due to different coding methods, not differences in the underyling data.



************************************************************************************************************************************


* CREATE ONSET VARIABLE AND OTHER DEPENDENT VARIABLES USING GML'S ORIGINAL DATA AND BL'S METHOD

* Begin by figuring out which MIDs were bilateral on the first day.

use "GML MIDB ISQ.dta", clear
drop if orig==0
gen num_orig=1
collapse (sum) num_orig, by(dispnum3)
save "temp2.dta", replace

* Create dyads from MID 3.0

use "GML MIDB ISQ.dta", clear
mvdecode dispnum3-orig, mv(-9)
drop if sidea==0
ren * *a
ren dispnum3a dispnum3
save "temp.dta", replace

use "GML MIDB ISQ.dta", clear
mvdecode dispnum3-orig, mv(-9)
drop if sidea==1
ren * *b
ren dispnum3b dispnum3

joinby dispnum3 using "temp.dta"

* Drop entire MID if not bilateral on day 1

merge m:1 dispnum3 using "temp2.dta"
drop if num_orig>2

* Drop joiner dyads

drop if origa==0 | origb==0

* Identify start year

egen year=rowmax(styeara styearb)

* Eliminate duplicates by collapsing

collapse (max) hostleva hostlevb fatalitya fatalityb (min) year, by(dispnum3 ccodea ccodeb)

* Merge in necessary variables from MIDA

merge 1:1 dispnum3 using "GML MIDA ISQ.dta", keepusing(fatality recip hostlev)

* Code dependent variables

gen onset_gml1=1

ren recip recip_gml1
replace recip_gml1=0 if recip_gml1==.

gen force_gml1=(hostlev>3)
replace force_gml1=0 if hostlev==.

gen mut_force_gml1=(hostleva>3 & hostlevb>3)
replace mut_force_gml1=0 if hostleva==. | hostlevb==.

gen fatal_gml1=(fatalitya>0 | fatalityb>0)

gen high_fatal_gml1=(fatalitya>3 | fatalityb>3)

gen war_gml1=(fatality>5)

* There should only be one dyad per year in the BL data, so we collapse again

gen ccodehigh=ccodea if ccodea>ccodeb
replace ccodehigh=ccodeb if ccodeb>ccodea
gen ccodelow=ccodea if ccodea<ccodeb
replace ccodelow=ccodeb if ccodeb<ccodea
gen dyadid=ccodelow*1000+ccodehigh

collapse (max) onset_gml1 recip_gml1 force_gml1 mut_force_gml1 fatal_gml1 high_fatal_gml1 war_gml1, by (dyadid year)

save "BL Dyadic MIDs GML ISQ.dta", replace 

* Merge into BL original dataset and see how the variables compare.

use "B&L_CMPS.dta", clear

* B&L's dyad id variable is sometimes blank, so we fill it in.

gen ccodehigh=ccodea if ccodea>ccodeb
replace ccodehigh=ccodeb if ccodeb>ccodea
gen ccodelow=ccodea if ccodea<ccodeb
replace ccodelow=ccodeb if ccodeb<ccodea
replace dyadid=ccodelow*1000+ccodehigh

* Now merge

merge m:1 dyadid year using "BL Dyadic MIDs GML ISQ.dta"
drop if _merge==2
drop _merge
replace onset_gml1=0 if onset_gml1==.

tab onset onset_gml1
* Now there are only 264 differences with BL's onset variable.
* In other words, using a method of creating the onset variable that is more similar to BL's method produces a more similar coding, despite using the GML data.



********************************************************************************************************************************************


* CREATE ONSET VARIABLE AND OTHER DEPENDENT VARIABLES USING GML'S 2.1 DATA AND BL'S METHOD

* In addition to replicating BL's coding method with the GML data used in the ISQ article, we also replicate it with the GML 2.1 data - the latest version.

* Begin by figuring out which MIDs were bilateral on the first day.

use "GML MIDB 2.1.dta", clear
drop if sidea=="NA" //There is just one observation in this dataset for which several variables have the value of NA.
destring orig, replace //The NA codings make many of the variables strings.
drop if orig==0
gen num_orig=1
collapse (sum) num_orig, by(dispnum)
save "temp2.dta", replace

* Create dyads from GML 2.1

use "GML MIDB 2.1.dta", clear
mvdecode dispnum-orig, mv(-9)
drop if sidea=="NA"
destring orig, replace
destring sidea, replace
destring hostlev, replace
destring fatality, replace
drop if sidea==0
ren * *a
ren dispnuma dispnum
save "temp.dta", replace

use "GML MIDB 2.1.dta", clear
mvdecode dispnum-orig, mv(-9)
drop if sidea=="NA"
destring orig, replace
destring sidea, replace
destring hostlev, replace
destring fatality, replace
drop if sidea==1
ren * *b
ren dispnumb dispnum

joinby dispnum using "temp.dta"

* Drop entire MID if not bilateral on day 1

merge m:1 dispnum using "temp2.dta"
drop if num_orig>2

* Drop joiner dyads

drop if origa==0 | origb==0

* Identify start year

egen year=rowmax(styeara styearb)

* Eliminate duplicates by collapsing

collapse (max) hostleva hostlevb fatalitya fatalityb (min) year, by(dispnum ccodea ccodeb)

* Merge in necessary variables from MIDA

ren dispnum dispnum3 //This variable has different names across GML MIDA 2.1 and GML MIDB 2.1.
merge 1:1 dispnum3 using "GML MIDA 2.1.dta", keepusing(fatality recip hostlev)

* Code dependent variables

gen onset_gml2=1

ren recip recip_gml2
replace recip_gml2=0 if recip_gml2==.

gen force_gml2=(hostlev>3)
replace force_gml2=0 if hostlev==.

gen mut_force_gml2=(hostleva>3 & hostlevb>3)
replace mut_force_gml2=0 if hostleva==. | hostlevb==.

gen fatal_gml2=(fatalitya>0 | fatalityb>0)

gen high_fatal_gml2=(fatalitya>3 | fatalityb>3)

gen war_gml2=(fatality>5)

* There should only be dyad per year in the BL data, so we collapse again

gen ccodehigh=ccodea if ccodea>ccodeb
replace ccodehigh=ccodeb if ccodeb>ccodea
gen ccodelow=ccodea if ccodea<ccodeb
replace ccodelow=ccodeb if ccodeb<ccodea
gen dyadid=ccodelow*1000+ccodehigh

collapse (max) onset_gml2 recip_gml2 force_gml2 mut_force_gml2 fatal_gml2 high_fatal_gml2 war_gml2, by (dyadid year)

save "BL Dyadic MIDs GML 2.1.dta", replace 
* This will be used for later analysis.


********************************************************************************************************************************************


* CREATE ONSET VARIABLE AND OTHER DEPENDENT VARIABLES USING THE MID 4.3 DATA AND BL'S METHOD

* Begin by figuring out which MIDs were bilateral on the first day.

use "MIDB 4.3.dta", clear
drop if orig==0
gen num_orig=1
collapse (sum) num_orig, by(dispnum3)
save "temp2.dta", replace

* Create dyads from MID 3.0

use "MIDB 4.3.dta", clear
mvdecode dispnum3-orig, mv(-9)
drop if sidea==0
ren * *a
ren dispnum3a dispnum3
save "temp.dta", replace

use "MIDB 4.3.dta", clear
mvdecode dispnum3-orig, mv(-9)
drop if sidea==1
ren * *b
ren dispnum3b dispnum3

joinby dispnum3 using "temp.dta"

* Drop entire MID if not bilateral on day 1

merge m:1 dispnum3 using "temp2.dta"
drop if num_orig>2

* Drop joiner dyads

drop if origa==0 | origb==0

* Identify start year

egen year=rowmax(styeara styearb)

* Eliminate duplicates by collapsing

collapse (max) hostleva hostlevb fatalitya fatalityb (min) year, by(dispnum3 ccodea ccodeb)

* Merge in necessary variables from MIDA

merge 1:1 dispnum3 using "MIDA 4.3.dta", keepusing(fatality recip hostlev)

* Code dependent variables

gen onset_mid4=1

ren recip recip_mid4
replace recip_mid4=0 if recip_mid4==.

gen force_mid4=(hostlev>3)
replace force_mid4=0 if hostlev==.

gen mut_force_mid4=(hostleva>3 & hostlevb>3)
replace mut_force_mid4=0 if hostleva==. | hostlevb==.

gen fatal_mid4=(fatalitya>0 | fatalityb>0)

gen high_fatal_mid4=(fatalitya>3 | fatalityb>3)

gen war_mid4=(fatality>5)

* There should only be one dyad per year in the BL data, so we collapse again

gen ccodehigh=ccodea if ccodea>ccodeb
replace ccodehigh=ccodeb if ccodeb>ccodea
gen ccodelow=ccodea if ccodea<ccodeb
replace ccodelow=ccodeb if ccodeb<ccodea
gen dyadid=ccodelow*1000+ccodehigh

collapse (max) onset_mid4 recip_mid4 force_mid4 mut_force_mid4 fatal_mid4 high_fatal_mid4 war_mid4, by (dyadid year)

save "BL Dyadic MIDs 4.3.dta", replace 
* This will be used for later analysis.
